NiW: Converting Notebooks into Workflows to Capture Dataflow and Provenance

نویسندگان

  • Lucas A. M. . C. Carvalho
  • Regina Wang
  • Yolanda Gil
  • Daniel Garijo
چکیده

Interactive notebooks are increasingly popular among scientists to expose computational methods and share their results. However, it is often challenging to track their dataflow, and therefore the provenance of their results. This paper presents an approach to convert notebooks into scientific workflows that capture explicitly the dataflow across software components and facilitate tracking provenance of new results. In our approach, users should first write notebooks according to a set of guidelines that we have designed, and then use an automated tool to generate workflow descriptions from the modified notebooks. Our approach is implemented in NiW (Notebooks into Workflows), and we demonstrate its use by generating workflows with third-party notebooks. The resulting workflow descriptions have explicit dataflow, which facilitates tracking provenance of new results, comparison of workflows, and sub-workflow mining. Our guidelines can also be used to improve understandability of notebooks by making the dataflow more explicit.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dataflow Notebooks: Encoding and Tracking Dependencies of Cells

Computational notebooks have seen widespread adoption among scientists in many fields, and allow users to view interactive graphical results inline, to embed text and code together, to organize code into cells, and to selectively edit and re-execute cells. Because they allow quick and recordable analyses, they play an important role in documenting experiments. However, the reproducibility of no...

متن کامل

A Graph Model of Data and Workflow Provenance

Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of c...

متن کامل

Atomicity and provenance support for pipelined scientific workflows

Today many significant scientific discoveries are achieved through complex and distributed scientific computations that are structured and represented as scientific workflows. Although atomicity is a well studied topic in transaction processing and business workflows, such an important capability needs to be revisited in a scientific workflow environment. Firstly, the semantics of atomicity nee...

متن کامل

A Dataflow-Oriented Atomicity and Provenance System for Pipelined Scientific Workflows

Scientific workflows have gained great momentum in recent years due to their critical roles in e-Science and cyberinfrastructure applications. However, some tasks of a scientific workflow might fail during execution. A domain scientist might require a region of a scientific workflow to be “atomic”. Data provenance, which determines the source data that are used to produce a data item, is also e...

متن کامل

Mapping the NRC Dataflow Model to the Open Provenance Model

The Open Provenance Model (OPM) has recently been proposed as an exchange framework for workflow provenance information. In this paper we show how the NRC data model for workflow repositories can be mapped to the OPM. Our mapping includes such features as complex data flow in an execution of a workflow; different workflows in the repository that call each other; and the tracking of subvalues of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017